NSF PAR Search | NSF Public Access Repository

Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs

Rodríguez-Iglesias, Alonso; Tongli, Santoshkumar T; Tucker, Emily; Pouchet, Louis-Noël; Rodríguez, Gabriel; Touriño, Juan (June 2025, Proceedings of the ACM on programming languages)

Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In this work we present UZP, a new sparse format based on polyhedral sets of integer points. UZP is a flexible format that subsumes CSR, COO, DIA, BCSR, etc., by raising them to a common mathematical abstraction: a union of integer polyhedra, each intersected with an affine lattice. We present a modular approach to building and optimizing UZP: it captures equivalence classes for the sparse structure, enabling the tuning of the representation for target-specific and application-specific performance considerations. UZP is built from any input sparse structure using integer coordinates, and is interoperable with existing software using CSR and COO data layout. We provide detailed performance evaluation of UZP on 200+ matrices from SuiteSparse, demonstrating how simple and mostly unoptimized generic executors for UZP can already achieve solid performance by exploiting Z-polyhedra structures.
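To make the "polyhedron intersected with an affine lattice" idea concrete, here is a minimal sketch of SpMV over a union of one-dimensional strided shapes. It is an illustration only and does not reflect the actual UZP layout: the `LatticeSegment` class, its fields, and `spmv_union` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LatticeSegment:
    """Hypothetical 1D shape: nonzeros at (row0 + k*drow, col0 + k*dcol), 0 <= k < count."""
    row0: int
    col0: int
    drow: int
    dcol: int
    count: int
    values: List[float]  # one value per point, ordered by k


def spmv_union(segments: Sequence[LatticeSegment],
               x: Sequence[float], nrows: int) -> List[float]:
    """Compute y = A @ x where A is given as a union of strided segments."""
    y = [0.0] * nrows
    for seg in segments:
        for k in range(seg.count):
            # Coordinates come from the affine description, not from index arrays.
            r = seg.row0 + k * seg.drow
            c = seg.col0 + k * seg.dcol
            y[r] += seg.values[k] * x[c]
    return y


# 4x4 example: the main diagonal (a DIA-like shape) plus a
# stride-2 run in row 0 covering columns 1 and 3.
segments = [
    LatticeSegment(row0=0, col0=0, drow=1, dcol=1, count=4,
                   values=[1.0, 2.0, 3.0, 4.0]),
    LatticeSegment(row0=0, col0=1, drow=0, dcol=2, count=2,
                   values=[5.0, 6.0]),
]
y = spmv_union(segments, [1.0, 1.0, 1.0, 1.0], nrows=4)
```

In this toy encoding each shape stores five integers regardless of how many nonzeros it covers, which is the kind of compaction a shape-based format can exploit when the sparsity pattern is regular.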
Free, publicly-accessible full text available June 16, 2026
Formal Verification of Source-to-Source Transformations for HLS

https://doi.org/10.1145/3626202.3637563

Pouchet, Louis-Noël; Tucker, Emily; Zhang, Niansong; Chen, Hongzheng; Pal, Debjit; Rodríguez, Gabriel; Zhang, Zhiru (March 2024, International Symposium on Field Programmable Gate Arrays (FPGA'2024))
Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations

https://doi.org/10.1145/3559009.3569668

Horro, Marcos; Pouchet, Louis-Noël; Rodríguez, Gabriel; Touriño, Juan (October 2022, Proceedings of the ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'22))

Full Text Available
Representing Integer Sequences Using Piecewise-Affine Loops

https://doi.org/10.3390/math9192368

Rodríguez, Gabriel; Pouchet, Louis-Noël; Touriño, Juan (October 2021, Mathematics)

A formal, high-level representation of programs is typically needed for static and dynamic analyses performed by compilers. However, the source code of target applications is not always available in an analyzable form, e.g., to protect intellectual property. To reason on such applications, it becomes necessary to build models from observations of their execution. This paper details an algebraic approach which, taking as input the trace of memory addresses accessed by a single memory reference, synthesizes an affine loop with a single perfectly nested reference that generates the original trace. This approach is extended to support the synthesis of unions of affine loops, useful for minimally modeling traces generated by automatic transformations of polyhedral programs, such as tiling. The resulting system is capable of processing hundreds of gigabytes of trace data in minutes, minimally reconstructing 100% of the static control parts in PolyBench/C applications and 99.99% in the Pluto-tiled versions of these benchmarks. As an application example of the trace modeling method, trace compression is explored. The affine representations built for the memory traces of PolyBench/C codes achieve compression factors of the order of 10^6 and 10^3 with respect to gzip for the original and tiled versions of the traces, respectively.
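As a rough illustration of what recovering an affine loop from an address trace means, the sketch below fits a two-level perfectly nested loop to a trace and verifies the fit by regenerating the trace. It is a simplified heuristic written for this example, not the algebraic method of the paper; the function name `fit_two_level_loop` and the returned tuple layout are assumptions.

```python
from typing import List, Optional, Tuple


def fit_two_level_loop(
    trace: List[int],
) -> Optional[Tuple[int, int, int, int, int]]:
    """Try to explain `trace` as
        for i in range(outer_trips):
            for j in range(inner_trips):
                access(base + i*outer_stride + j*inner_stride)
    Returns (base, outer_trips, outer_stride, inner_trips, inner_stride),
    or None if no such loop reproduces the trace exactly.
    """
    n = len(trace)
    if n < 2:
        return None
    base = trace[0]
    inner_stride = trace[1] - trace[0]

    # The inner loop lasts as long as consecutive differences stay constant.
    inner_trips = 1
    while (inner_trips < n
           and trace[inner_trips] - trace[inner_trips - 1] == inner_stride):
        inner_trips += 1

    if n % inner_trips != 0:
        return None
    outer_trips = n // inner_trips
    outer_stride = trace[inner_trips] - base if outer_trips > 1 else 0

    # Accept the model only if it regenerates the observed trace.
    regenerated = [base + i * outer_stride + j * inner_stride
                   for i in range(outer_trips)
                   for j in range(inner_trips)]
    if regenerated != trace:
        return None
    return base, outer_trips, outer_stride, inner_trips, inner_stride


# Trace of A[i][j] for a 3x4 sub-block of 8-byte elements in rows of 10 elements.
trace = [1000 + 80 * i + 8 * j for i in range(3) for j in range(4)]
model = fit_two_level_loop(trace)
```

Here twelve addresses collapse to five integers. The same effect on much longer traces is what makes the compression application described in the abstract possible.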
Full Text Available
PolyBench/Python: benchmarking Python environments with polyhedral optimizations

https://doi.org/10.1145/3446804.3446842

Abella-González, Miguel Á.; Carollo-Fernández, Pedro; Pouchet, Louis-Noël; Rastello, Fabrice; Rodríguez, Gabriel (February 2021, CC 2021: 30th ACM SIGPLAN International Conference on Compiler Construction)
Python has become one of the most used and taught languages nowadays. Its expressiveness, cross-compatibility and ease of use have made it popular in areas as diverse as finance, bioinformatics or machine learning. However, Python programs are often significantly slower to execute than an equivalent native C implementation, especially for computation-intensive numerical kernels. This work presents PolyBench/Python, implementing the 30 kernels in PolyBench/C, one of the standard benchmark suites for polyhedral optimization, in Python. In addition to the benchmark kernels, a functional wrapper including mechanisms for performance measurement, testing, and execution configuration has been developed. The framework includes support for different ways to translate C-array codes into Python, offering insight into the tradeoffs of Python lists and NumPy arrays. The benchmark performance is thoroughly evaluated on different Python interpreters, and compared against its PolyBench/C counterpart to highlight the profitability (or lack thereof) of using Python for regular numerical codes.
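The choice of how to translate a C array into Python can be illustrated with a small matrix-vector kernel written two ways with plain lists. This is a hedged sketch, not code from PolyBench/Python: the function names are hypothetical, and the flattened-list variant is shown only as one plausible translation alongside nested lists.

```python
from typing import List


def matvec_nested(A: List[List[float]], x: List[float]) -> List[float]:
    """C-style A[i][j] translated to a list of row lists."""
    n, m = len(A), len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y


def matvec_flat(A: List[float], x: List[float], n: int) -> List[float]:
    """Same kernel with A stored row-major in one flat list of n*len(x) items."""
    m = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            # Explicit index linearization, as a C compiler would do for A[i][j].
            y[i] += A[i * m + j] * x[j]
    return y


x = [1.0, 1.0]
y_nested = matvec_nested([[1.0, 2.0], [3.0, 4.0]], x)
y_flat = matvec_flat([1.0, 2.0, 3.0, 4.0], x, n=2)
```

Both variants compute the same result; they differ in how many Python-level indexing operations each element access costs, which is the kind of tradeoff the benchmark is designed to measure.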
Full Text Available
Generating piecewise-regular code from irregular structures

https://doi.org/10.1145/3314221.3314615

Augustine, Travis; Sarma, Janarthanan; Pouchet, Louis-Noël; Rodríguez, Gabriel (June 2019, PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation)

Irregular data structures, as exemplified by sparse matrices, have proved to be essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of Sparse Matrix-Vector multiply (SpMV). In this work we instead propose a fundamentally different approach: to automatically build sets of regular sub-computations by mining for regular sub-regions in the irregular data structure. Our approach leads to code that is specialized to the sparsity structure of the input matrix, but which no longer needs any indirection array, thereby improving SIMD vectorizability. We particularly focus on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository.
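The contrast between a classical CSR kernel and code specialized to a regular sub-region can be sketched as follows. The specialized version is hand-written for one small matrix purely for illustration; the paper generates such code automatically, and the names `spmv_csr` and `spmv_specialized` are hypothetical.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Classical CSR SpMV: every access to x goes through col_idx."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[p] * x[col_idx[p]]
    return y


def spmv_specialized(diag: List[float], x: List[float]) -> List[float]:
    """Code specialized to this matrix: a regular diagonal region
    handled without any index array, plus one leftover entry at (2, 0)."""
    y = [diag[i] * x[i] for i in range(4)]  # regular sub-region
    y[2] += 5.0 * x[0]                      # remaining irregular nonzero
    return y


# 4x4 matrix with diagonal [1, 2, 3, 4] and one extra nonzero A[2][0] = 5.
row_ptr = [0, 1, 2, 4, 5]
col_idx = [0, 1, 0, 2, 3]
vals = [1.0, 2.0, 5.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0]

y_csr = spmv_csr(row_ptr, col_idx, vals, x)
y_spec = spmv_specialized([1.0, 2.0, 3.0, 4.0], x)
```

The diagonal loop in the specialized version has affine accesses to both `diag` and `x`, so a compiler for a native language could vectorize the equivalent loop without gather instructions.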
Full Text Available
Effect of Distributed Directories in Mesh Interconnects

https://doi.org/10.1145/3316781.3317808

Horro, Marcos; Kandemir, Mahmut T.; Pouchet, Louis-Noël; Rodríguez, Gabriel; Touriño, Juan (June 2019, DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019)

Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Xeon Phi Knights Landing. It features 38 tiles packed in a single die, organized into a 2D mesh. Before accessing remote data, tiles need to query the distributed directory. The effect of this coherence traffic is poorly understood. We show that the apparent UMA behavior results from the degradation of the peak performance. We develop ways to optimize the coherence traffic, the core-to-core-affinity, and the scheduling of a set of tasks on the mesh, leveraging the unique characteristics of processor units stemming from process variations.
Full Text Available